In making inference on the relation between failure and exposure histories in the Cox
semiparametric model, the maximum partial likelihood estimator (MPLE) of the finite
dimensional odds parameter, and the Breslow estimator of the baseline survival function,
are known to achieve full efficiency when data is available for all time on all cohort
members, even when the covariates are time dependent. When cohort sizes become too large
for the collection of complete data, sampling schemes such as nested case control sampling
must be used and, under various models, there exist estimators based on the same
information as the MPLE having smaller asymptotic variance.

Though the MPLE is therefore not efficient under sampling in general, it approaches
efficiency in highly stratified situations, or instances where the covariate values are
increasingly less dependent upon the past, when the covariate distribution, not depending
on the real parameter of interest, is unknown and there is no censoring. In particular, in
such situations, when using the nested case control sampling design, both the MPLE and the
Breslow estimator of the baseline survival function achieve the information lower bound
both in the distributional and the minimax senses in the limit as the number of cohort
members tends to infinity.

Keywords: highly stratified; information bound; semi-parametric models 

1. Introduction 

For many epidemiologic studies, the cohort from which failures are observed is simply 
too large for the collection of full exposure data, and in order to make inference on the 
connection between exposure history and failure it becomes a matter of practical necessity 
to sample. For a cohort followed over time, one of the simplest sampling schemes, termed 
nested case control sampling [15], is to choose a fixed number of controls to compare to 
the failure at each failure time. Though it has previously been shown that the maximum 
partial likelihood estimator (MPLE) in the Cox semi-parametric model achieves full 
efficiency when data is available for all time on all cohort members, the same is no longer 
true in certain situations when schemes such as nested case control sampling are used.
In counterpoint to such cases, here we explore a model where the MPLE is efficient,
in both the distributional and minimax senses, for the nested case control sampling
scheme. We also show that similar remarks apply to the Breslow estimator of the
baseline hazard. Knowing in which situations the MPLE is close to efficient provides some 
guidelines on when it may be applied with little risk of efficiency loss, and when other 
estimators, perhaps depending on additional modeling assumptions, should be considered 
as an alternative. 

In the standard Cox model [5], a common but unspecified baseline hazard function
$\lambda(t)$ is assumed to apply to all cohort members. The relation between exposure and
failure is the one of most interest, and is modeled by the real parameter $\theta$
specifying the increased relative risk, having the exponential form $e^{\theta Z}$, say,
for an individual with covariate $Z$. The unknown baseline is considered for the most part
to be a nuisance parameter. When covariate information is available on all cohort members,
the maximum partial likelihood estimator (MPLE) makes inference on the parametric component
of such models by maximizing a 'partial likelihood', that is, the product of the
conditional probabilities, over all failures $i_j$, that individual $i_j$ failed given that
the individuals $\mathcal{R}_{i_j}$ were also at risk to fail when $i_j$ failed,

$$L(\theta) = \prod_j \frac{e^{\theta Z_{i_j}}}{\sum_{k \in \mathcal{R}_{i_j}} e^{\theta Z_k}}. \qquad (1)$$

We note that the unspecified baseline hazard cancels upon forming this conditional prob- 
ability. 
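As a minimal sketch of how the partial likelihood (1) is evaluated, the following Python fragment computes its logarithm from a list of risk sets. The function name, the data layout and the toy numbers are assumptions made for illustration only; they do not come from the paper.

```python
import math

def log_partial_likelihood(theta, risk_sets):
    """Log of the partial likelihood (1).

    risk_sets is a list of (z_case, z_at_risk) pairs, one per failure, where
    z_at_risk holds the covariates of everyone at risk, the case included.
    The baseline hazard never appears: it cancels in each conditional probability.
    """
    total = 0.0
    for z_case, z_at_risk in risk_sets:
        denom = sum(math.exp(theta * z) for z in z_at_risk)
        total += theta * z_case - math.log(denom)
    return total

# Hypothetical toy data: two failures, each with a risk set of size 3.
toy = [(1.0, [1.0, 0.0, 2.0]), (0.5, [0.5, 1.5, 0.0])]
print(log_partial_likelihood(0.0, toy))  # at theta = 0 each factor equals 1/3
```

Maximizing this function over theta, for instance by a one-dimensional root search on its derivative, would give the MPLE for the toy data.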

When data is only available on some sampled subset $\tilde{\mathcal{R}}_{i_j}$ of the
entire cohort $\mathcal{R}_{i_j}$, an estimator may be formed by replacing
$\mathcal{R}_{i_j}$ by $\tilde{\mathcal{R}}_{i_j}$ (see [4]), possibly then mandating the
use of weights so that the MPLE remains consistent. Nested case control sampling, which
does not require the use of such weights, is the instance where
$\tilde{\mathcal{R}}_{i_j}$ consists of the failure $i_j$ and $m-1$ non-failed individuals
to serve as controls, chosen uniformly at random from those at risk at the time of the
failure.
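The sampling step just described can be sketched in a few lines of Python. The helper name `sample_risk_set`, the cohort of ten individuals and the fixed seed are hypothetical choices for illustration, not part of the original design description.

```python
import random
from math import comb

def sample_risk_set(case, at_risk, m, rng):
    """Return the sampled risk set: the case followed by m - 1 controls drawn
    uniformly without replacement from the non-failed individuals at risk."""
    controls = [i for i in at_risk if i != case]
    return [case] + rng.sample(controls, m - 1)

rng = random.Random(0)        # fixed seed, used only to make the sketch reproducible
at_risk = list(range(10))     # hypothetical: ten individuals at risk at the failure time
m = 3
sampled = sample_risk_set(case=3, at_risk=at_risk, m=m, rng=rng)

# Every set of size m containing the case is equally likely to be selected.
prob_each_set = 1 / comb(len(at_risk) - 1, m - 1)
print(sampled, prob_each_set)
```

Because every admissible set has the same selection probability, that probability cancels from each factor of the partial likelihood, which is consistent with the remark that no weights are needed for this design.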

One price to pay for the ability to estimate $\theta$ while leaving the nonparametric baseline
hazard unspecified, and the subsequent use of the MPLE, is that it is not a true likelihood 
being maximized, and efficiency concerns arise. In particular, it is not clear whether one 
can construct estimators that depend on the same data as the MPLE but have better 
performance. In the paper of Begun et al. [1], however, these concerns are put to rest in 
the full cohort case where the covariates are time fixed, as the authors demonstrate that 
in that situation the MPLE achieves the semi-parametric efficiency bound. Greenwood 
and Wefelmeyer [10] show the MPLE is efficient in the full cohort situation even when 
the covariates are allowed to depend on time. Similar remarks also apply to the Breslow 
estimator of the baseline hazard. 

The situation is different under sampling: Robins et al. [14] have shown that for time
fixed covariates the MPLE is not efficient under nested case control sampling. In this 
situation, there may exist modified estimators that take advantage of the time fixed 
nature of the covariates, in that the exposure for a control sampled in the past is still 
valid at a future failure time. In time varying covariate models, Chen [6], among others,
has modified the MPLE to yield consistent estimators of the parametric parameter that
have smaller asymptotic variance than the MPLE. The estimator proposed in [6] uses 
covariates sampled for other failures at time points near to that of a given failure to take 
advantage of already available information. Here, to realize a practical efficiency benefit, 
the sequence of failure times must be sufficiently dense and the covariates must not be 
varying too rapidly in time. Though in the time fixed covariate situation the modified 
estimator uses information from the past specifically, in both cases one relies on the 
dependence of the covariate values over time to realize some efficiency gain; for the time 
varying covariate models, such modified estimators will perform better the stronger the 
time dependence. Due to the various improvements on the performance of the MPLE, 
it becomes less clear in just which ways its performance can be improved, or, in other 
words, whether the MPLE fails to be efficient for reasons in addition to the ones by which 
these modified estimators achieve their gains. 

Showing that there is some sense in which the MPLE for nested case control sampling 
is efficient is therefore valuable for two reasons. First, it limits the scope of the search for 
estimators that might improve the MPLE's performance. Second, it indicates the use of 
the simple MPLE, and not a more complex version of same, in situations that achieve or 
approximate those in which it cannot be improved. 

Based on the known instances where the MPLE fails to be efficient under sampling, 
to find models where it is, by contrast, efficient, we are led to consider situations where 
covariate information collected for one failure is not useful at any other failure time. 
Indeed, such situations are fairly common in epidemiologic studies, in particular, when
highly stratified cohorts are followed over a short period of time. Due to the short time 
under study, the covariates may be considered time fixed, and there is, for that same 
reason, little or no censoring. Last, in such cases, the groups corresponding to the terms in 
the product of the partial likelihood are independent, or very nearly so. A continuous time 
covariate model where the failures are spaced far apart relative to the correlation time 
of the covariates will also have the property that the covariate values at one failure time 
will be nearly independent of those at any other. In fact, in the limit, this latter situation 
becomes the former, highly stratified case. Thus we are led to a time fixed covariate
model $f$ having no censoring, where we observe $n$ independent units of information,
each consisting of the observed failure from a cohort of a possibly random number $\eta$ of
individuals who are comparable to the failure, the covariate value of the failure, and the
covariate values of $m-1$ sampled controls.

A concrete example of such a situation is the study of occupational exposure to
electromagnetic fields (EMF) and leukemia [11], which is fairly typical of cancer registry
based case-control studies. The cohort is the adult male population in mid-Sweden
followed over 1983-1987 for cases of leukemia. Two controls were sampled from risk sets
based on the age of the 250 leukemia cases, matching on year of birth and geographic
location. In this study, with the four-year follow-up and fine stratification, there is
little censoring and almost all strata have at most one failure; thus the sampling model
considered here very closely approximates the circumstances of the study.






L. Goldstein and H. Zhang 



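It is easy to verify that in these situations, letting $Z$ have the distribution of the
i.i.d. covariates, under the null $\theta_0 = 0$ the information
$-E[\partial^2 \log L(\theta)/\partial\theta^2] = \sigma^2_{\mathrm{MPLE}}$, where

$$\sigma^2_{\mathrm{MPLE}} = \left(\frac{m-1}{m}\right) \mathrm{Var}(Z),$$

and $L(\theta)$ is as in (1) with the set of those at risk $\mathcal{R}_{i_j}$ replaced by
the nested case control sampled risk set $\tilde{\mathcal{R}}_{i_j}$. Hence, under
regularity (see, e.g., [4, 7, 8]) the MPLE $\hat\theta_n$ is asymptotically normal and
satisfies

$$\sqrt{n}(\hat\theta_n - \theta_0) \to_d \mathcal{N}(0, \sigma^{-2}_{\mathrm{MPLE}}).$$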
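The null information per sampled risk set can be checked exactly in a small case. At $\theta = 0$, minus the second derivative of one log partial likelihood factor is the equally weighted variance of the $m$ covariates in the sampled risk set, and its expectation over i.i.d. covariates is $((m-1)/m)\,\mathrm{Var}(Z)$. The sketch below verifies this with exact rational arithmetic; the three-point covariate distribution is a hypothetical choice for illustration.

```python
from fractions import Fraction
from itertools import product

def info_per_risk_set(values, probs, m):
    """Expected equally weighted variance of m i.i.d. covariates, i.e. the expected
    value of -d^2/dtheta^2 of one log partial likelihood factor at theta = 0."""
    total = Fraction(0)
    for combo in product(range(len(values)), repeat=m):
        p = Fraction(1)
        for k in combo:
            p *= probs[k]
        zs = [values[k] for k in combo]
        mean = sum(zs, Fraction(0)) / m
        var_in_set = sum(((z - mean) ** 2 for z in zs), Fraction(0)) / m
        total += p * var_in_set
    return total

# Hypothetical discrete covariate distribution on {0, 1, 3}.
values = [Fraction(0), Fraction(1), Fraction(3)]
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
m = 3

mean_z = sum((p * v for p, v in zip(probs, values)), Fraction(0))
var_z = sum((p * (v - mean_z) ** 2 for p, v in zip(probs, values)), Fraction(0))
print(info_per_risk_set(values, probs, m), Fraction(m - 1, m) * var_z)
```

The enumeration is exponential in m and is meant only as a check of the identity, not as a way to compute information in practice.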

Our main result, Theorem 2.5, shows that when considering a growing cohort size, the
limiting effective information in the data, $I_*(\theta_0)$, equals
$\sigma^2_{\mathrm{MPLE}}$, and that the MPLE is efficient in the limit in both the
convolution lower bound and minimax senses. Theorem 2.6 shows similar remarks apply to the
Breslow estimator of the baseline survival function.

When the complete set of covariate values is observed it is unimportant whether the 
covariate distribution is considered known or unknown. Again, the situation under
sampling is different; knowing the covariate distribution allows one to estimate large sample
quantities with some accuracy. Consequently, the hypothesis of Theorem 2.5 includes the 
assumption that the covariate distribution is unknown, and the subsequent analysis must 
therefore handle two infinite dimensional nuisance parameters, one for the unknown base- 
line density, the other for the unknown covariate distribution. In particular, the results 
leave open the possibility of improved estimators that take advantage of a known covari- 
ate distribution. Nevertheless, such improvements must necessarily depend on having 
information about, and correctly modeling, the covariate distribution, and consequently 
invite the possibility of bias due to modeling misspecification. 

We consider the Cox model under the usual exponential relative risk, though the meth- 
ods here may be applied for other relative risk forms, as was accomplished in [10] for the 
full cohort, time varying covariate model. The methods here also extend to accommodate 
censoring, though this generalization requires the inclusion of a third infinite dimensional 
parameter, the censoring density and consequently the handling of an additional operator 
corresponding to the unknown censoring density. 

The outline of this work is as follows: In Section 2.1 we review and slightly modify the
theory in [1] for the calculation of information bounds in semi-parametric models to
accommodate a pair of unknown densities. In Section 2.2 we further specialize that theory
to the case at hand and formally state our model and the main results that were outlined
above. Application of the theory presented in Section 2.1 for the relative risk parameter
$\theta$ requires verification of three assumptions. The first, Assumption 2.1, is that
certain collections of perturbations form a subspace. The second, Assumption 2.2, is
connected to the Hellinger differentiability of the observation density $f$, in particular,
that perturbations of the nonparametric baseline and covariate density affect $f$ by
amounts given by operators $A$ and $B$ evaluated on the respective perturbations, and that
perturbing the parametric parameter results in a score $\rho_\theta$. The third,
Assumption 2.3, is that the orthogonal projection of the parametric score $\rho_\theta$ is
contained in a certain subspace, $\mathbb{K}$. In order to proceed as quickly as possible
to the calculation of the information bounds in Section 4, we present in Section 3 only a
subset of the properties eventually required of the operators $A$ and $B$ and of the score
$\rho_\theta$.

The remaining properties required of $A$ and $B$ are shown in the Appendix in
Sections A.1 and A.2. An outline of the verification of Assumptions 2.1, 2.2 and 2.3 is
given in Section A.3; the detailed calculations can be found in the technical report [9].
Remarks on the modifications made to the theory in [1] that are necessary for our
application can be found in Section A.4.

2. Information bounds for sampling in the Cox model 

In Section 2.1 we review and adapt the framework of [1] for the calculation of
information bounds in semi-parametric models to the case where there are two unknown
one-dimensional density functions. In Section 2.2 we specify the model $f$ for nested case
control sampling and formally state our main result showing that the MPLE, and the
Breslow estimator, achieve their respective efficiency lower bounds.

2.1. Information bounds in semi-parametric models 

This section closely follows the treatment in [1] for deriving lower bounds for
estimation in semi-parametric models; see also the text [3]. Let $L^2(\mu)$ denote the
collection of functions that are square integrable with respect to a measure $\mu$, and for
$u, v \in L^2(\mu)$ we let $\langle u, v\rangle_\mu = \int uv\, d\mu$ and
$\|u\|_\mu^2 = \langle u, u\rangle_\mu$. Here, as in [1], the data consists of $n$ i.i.d.
observations $X_1, \ldots, X_n$ taking values in a measurable space
$(\mathcal{X}, \mathcal{F}_{\mathcal{X}})$, and the density function $f$ of a single
observation is with respect to a sigma-finite measure $\sigma$. We consider a model where
the density $f = f(\cdot, \theta, g, h)$ is determined by a real parameter $\theta$, the one
of most interest, and by the infinite dimensional parameter $p = (g, h)$, a vector of two
unknown densities $g$ and $h$, the baseline failure time density, and the marginal covariate
density, respectively.

Let $\mathcal{D}^+$ and $\mathcal{D}$ denote the collection of densities with respect to
Lebesgue measure $\nu^+$ and $\nu$ on $\mathbb{R}^+ = [0, \infty)$ and $\mathbb{R}$,
respectively. We let the parameter space $\mathcal{G}$ for the unknown baseline failure
density be



To impose growth conditions on the covariates similar to the ones typically assumed, for
a covariate density $h : \mathbb{R} \to [0, \infty)$ and $\theta \in \mathbb{R}$ let



1+ 




where — — — e' 
For some fixed C > and < 9^ < 9^^ we let the parameter space for the covariate density
be 

n = {\\(:,V:Mu{9)<oo for aU 16*1 < 9,, and Mu{9{} + A/h(-%) < S}. 
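Hence, the parameter space $\mathcal{P}$ for the pair $p$ of unknown densities is given by

$$\mathcal{P} = \mathcal{G} \times \mathcal{H}.$$

Adopting slightly inconsistent notation for the sake of ease, we let $\theta_0$ denote the
null parameter in $\mathbb{R}$, and henceforth, $g$ and $h$ the null parameters in
$\mathcal{G}$ and $\mathcal{H}$, respectively; we label them also as $g_0$ and $h_0$ when
convenient.

For $\tau \in \mathbb{R}$ let $\Theta(\tau)$ denote the collection of all real sequences
$\{\theta_n\}_{n \ge 1}$ such that

$$|\sqrt{n}(\theta_n - \theta_0) - \tau| \to 0 \quad \text{as } n \to \infty, \quad \text{and set} \quad \Theta = \bigcup\{\Theta(\tau) : \tau \in \mathbb{R}\}.$$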

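Let $\Pi_\theta = L^2(\nu^+) \times L^2(\nu_\theta)$ and for
$\gamma = (\alpha, \beta) \in \Pi_\theta$ let
$\|\gamma\|_{\Pi_\theta} = \max\{\|\alpha\|_{\nu^+}, \|\beta\|_{\nu_\theta}\}$, the
product metric, and, with $p = (g, h)$ as the null parameter, let $\mathcal{C}(p, \gamma)$
be the collection of all sequences
$\{p_n\}_{n \ge 0} = \{(g_n, h_n)\}_{n \ge 0} \subset \mathcal{P}$ such that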

7lln.^0 asrwoo foraU |^^| <0„. (2) 
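Let $\Gamma$ be the set of all $\gamma$ such that (2) holds for some
$\{p_n\}_{n \ge 0} \subset \mathcal{P}$, and

$$\mathcal{C}(p) = \bigcup_{\gamma \in \Gamma} \mathcal{C}(p, \gamma).$$

By considering the components of $\{p_n\}_{n \ge 0}$ we see that $\mathcal{C}_1(g, \alpha)$
is the collection of all sequences $\{g_n\}_{n \ge 0}$ in $\mathcal{G}$ that satisfy

$$\|\sqrt{n}(g_n^{1/2} - g^{1/2}) - \alpha\|_{\nu^+} \to 0 \quad \text{as } n \to \infty,$$

and therefore $\alpha \in L^2(\nu^+)$ satisfies $\alpha \perp g^{1/2}$ in $L^2(\nu^+)$, that
is, $\langle \alpha, g^{1/2}\rangle_{\nu^+} = 0$, or,

$$\int_0^\infty g^{1/2}\alpha\, d\nu^+ = 0. \qquad (3)$$

Now let

$$\mathcal{A} = \{\alpha \in L^2(\nu^+) : \text{there exists } \{g_n\}_{n \ge 0} \subset \mathcal{G} \text{ such that } \|\sqrt{n}(g_n^{1/2} - g^{1/2}) - \alpha\|_{\nu^+} \to 0\}$$

and set

$$\mathcal{C}_1(g) = \bigcup_{\alpha \in \mathcal{A}} \mathcal{C}_1(g, \alpha).$$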

Similarly, C2{h,(3) is the collection of all sequences {hn}n>Q in such that 

WV^ihl/^ -h^/^)- ^0 as n^oo for aU |6i| <6'«. (4) 
For $\theta = 0$, (4) yields

$$\|\sqrt{n}(h_n^{1/2} - h^{1/2}) - \beta\|_{\nu} \to 0 \quad \text{as } n \to \infty, \qquad (5)$$

and therefore that $\beta$ satisfies

$$\int h^{1/2}\beta\, d\nu = 0. \qquad (6)$$



Now let B be the collection of all /3 e L^{i^) such that there exists {hn}n>o C H such that 

WV^ih]/' - /ji/') - ^ for all \e\ < e^, 

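and set

$$\mathcal{C}_2(h) = \bigcup_{\beta \in \mathcal{B}} \mathcal{C}_2(h, \beta).$$

Clearly

$$\mathcal{C}(p, \gamma) = \mathcal{C}_1(g, \alpha) \times \mathcal{C}_2(h, \beta), \quad \mathcal{C}(p) = \mathcal{C}_1(g) \times \mathcal{C}_2(h) \quad \text{and} \quad \Gamma = \mathcal{A} \times \mathcal{B}.$$

The following three assumptions will be needed to demonstrate Theorems 2.1 and 2.2,
and, in addition, the fourth will be needed for Theorems 2.3 and 2.4. The first is that
$\Gamma$ is a subspace of $L^2(\nu^+) \times L^2(\nu)$ or, equivalently,

Assumption 2.1. The sets $\mathcal{A}$ and $\mathcal{B}$ are subspaces of $L^2(\nu^+)$ and
$L^2(\nu)$, respectively.

It is shown in [1] that parts of the following assumption are a consequence of the
Hellinger differentiability of $f$; we verify Assumption 2.2 directly.

Assumption 2.2. There exists $\rho_\theta \in L^2(\sigma)$ and linear operators
$A : L^2(\nu^+) \to L^2(\sigma)$ and $B : L^2(\nu) \to L^2(\sigma)$ such that for any
$(\tau, \alpha, \beta) \in \mathbb{R} \times \mathcal{A} \times \mathcal{B}$ and

$$(\{\theta_n\}_{n \ge 0}, \{g_n\}_{n \ge 0}, \{h_n\}_{n \ge 0}) \in \Theta(\tau) \times \mathcal{C}_1(g, \alpha) \times \mathcal{C}_2(h, \beta), \qquad (7)$$

the sequence of densities given by $f_n = f(\cdot, \theta_n, g_n, h_n)$ for
$n = 0, 1, \ldots$ satisfies

$$\|\sqrt{n}(f_n^{1/2} - f_0^{1/2}) - \zeta\|_\sigma \to 0 \quad \text{for } \zeta = \tau\rho_\theta + A\alpha + B\beta \quad \text{as } n \to \infty. \qquad (8)$$

Let

$$\mathbb{H} = \{\zeta \in L^2(\sigma) : \zeta = \tau\rho_\theta + A\alpha + B\beta \text{ for some } \tau \in \mathbb{R}, \alpha \in \mathcal{A}, \beta \in \mathcal{B}\} \qquad (9)$$

and

$$\mathbb{K} = \{\delta \in L^2(\sigma) : \delta = A\alpha + B\beta \text{ for some } \alpha \in \mathcal{A} \text{ and } \beta \in \mathcal{B}\}. \qquad (10)$$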






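The classical projection theorem shows that the orthogonal projection of $\rho_\theta$
onto the closure of $\mathbb{K}$ is an element of the closure of $\mathbb{K}$. However, we
consider situations satisfying the following assumption, that is, where $\mathbb{K}$ itself
contains the projection of $\rho_\theta$.

Assumption 2.3. There exists $\hat\alpha \in \mathcal{A}$ and $\hat\beta \in \mathcal{B}$
such that $\hat\delta = A\hat\alpha + B\hat\beta$ satisfies

$$\rho_\theta - \hat\delta \perp \delta \quad \text{for all } \delta \in \mathbb{K}.$$

For any $\delta = A\alpha + B\beta \in \mathbb{K}$, by orthogonality,

$$\begin{aligned}
\|\rho_\theta - \delta\|_\sigma^2 &= \|\rho_\theta - A\alpha - B\beta\|_\sigma^2 \\
&= \|\rho_\theta - \hat\delta - A(\alpha - \hat\alpha) - B(\beta - \hat\beta)\|_\sigma^2 \\
&= \|\rho_\theta - \hat\delta\|_\sigma^2 + \|A(\alpha - \hat\alpha) + B(\beta - \hat\beta)\|_\sigma^2 \\
&\ge \|\rho_\theta - \hat\delta\|_\sigma^2;
\end{aligned}$$

hence $\hat\delta$ minimizes $\|\rho_\theta - \delta\|_\sigma$ over $\delta \in \mathbb{K}$,
and thus corresponds to the worst case direction of approach to the null, that is, the one
that minimizes the available information. Set the effective information to be

$$I_* = 4\|\rho_\theta - \hat\delta\|_\sigma^2. \qquad (11)$$

For $\zeta \in \mathbb{H}$ let $\mathcal{F}(f, \zeta)$ be the collection of all sequences
$\{f_n\}_{n \ge 0}$ such that (8) holds, and $\mathcal{F}(f)$ the union of
$\mathcal{F}(f, \zeta)$ over all $\zeta \in \mathbb{H}$. We say that an estimator
$\hat\theta_n$ of $\theta_0$ is regular at $f = f(\cdot, \theta_0, g, h)$ if for every
sequence $f_n(\cdot, \theta_n, g_n, h_n)$ with $\{\theta_n\}_{n \ge 0}$,
$\{g_n\}_{n \ge 0}$ and $\{h_n\}_{n \ge 0}$ as in (7), $\sqrt{n}(\hat\theta_n - \theta_0)$
converges in distribution to $\mathcal{L} = \mathcal{L}(f)$, which depends on $f$ but not on
the particular sequence $f_n$.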

The setup above differs in two ways from that in [1]. First, the model considered 
here has two nonparametric components, g and h, while in [1] only one nonparametric 
component is considered. Second, as we specify the parameter space $\mathcal{H}$ on the covariate
density h in such a way as to accommodate more relaxed integrability conditions, the 
resulting space of perturbations B is expressed as the intersection of subspaces (see [9]), 
one for each 9 in (—9,^,6^). This is so as the perturbations f3 are required to be limiting 
approximations to ^/n.{hn^ — /i^/^) in L'^{ve) for all \9\ < 9^,, rather than in L'^{v)- As 
(4) implies (5) our condition gives rise to a smaller collection $\mathcal{B}$ of
perturbations than in [1]. Nevertheless, only minimal adaptations of the proofs of
Theorems 3.1 and 3.2 and Theorems 4.1 and 4.2 of [1] are required to demonstrate
Theorems 2.1-2.4 for our model, so these are relegated to Section A.4.

Theorem 2.1. Suppose that $\hat\theta_n$ is a regular estimator of $\theta_0$ in the model
$f = f(\cdot, \theta, g, h)$ with limit law $\mathcal{L} = \mathcal{L}(f)$ and that
Assumptions 2.1-2.3 hold. Then $\mathcal{L}$ is the convolution of a normal
$\mathcal{N}(0, 1/I_*)$ distribution with a distribution depending only on $f$, where $I_*$
is given by (11).

We may also adapt the asymptotic minimax result of [1]. Recall that we say a loss
function $\ell : \mathbb{R} \to \mathbb{R}^+$ is subconvex when $\{x : \ell(x) \le y\}$ is
closed, convex and symmetric for every $y \ge 0$. We will also assume our loss function
satisfies

$$\int_{-\infty}^{\infty} \ell(z)\phi(az)\, dz < \infty \quad \text{for all } a > 0, \qquad (12)$$

where $\phi$ denotes the standard normal density function.

Theorem 2.2. Suppose Assumptions 2.1-2.3 hold and that $\ell$ is subconvex and satisfies
(12). For $c > 0$ let

$$B_n(c) = \{f_n \in \mathcal{F} : \sqrt{n}\|f_n^{1/2} - f^{1/2}\|_\sigma \le c\}. \qquad (13)$$

Then

$$\lim_{c \to \infty} \lim_{n \to \infty} \inf_{\hat\theta_n} \sup_{f_n \in B_n(c)} E_{f_n}\, \ell(\sqrt{n}(\hat\theta_n - \theta_n)) \ge E\ell(Z_*), \qquad (14)$$

where $Z_* \sim \mathcal{N}(0, 1/I_*)$ and $I_*$ is given by (11).

The infimum in (14) is taken over the class of "generalized procedures," the closure of
the class of randomized Markov kernel procedures (see [13], page 235). We also obtain
lower bounds on the performance of regular estimators of the baseline survival function
$G(\cdot)$ by similarly adapting Theorems 4.1 and 4.2 of [1] under the following assumption.

Assumption 2.4. The linear operator $A^*A : L^2(\nu^+) \to L^2(\nu^+)$ is invertible with
bounded inverse $(A^*A)^{-1}$.

We also suppose that, perhaps by a suitable map such as the probability integral
transformation, the density $g$ is supported on $[0, 1]$. Let

$$G_s = (1_{[0,s]} - G(s))\, g^{1/2},$$

and define the covariance functions

$$K(s,t) = \langle G_s, (A^*A)^{-1}G_t\rangle_{\nu^+} \quad \text{and} \quad K_*(s,t) = K(s,t) + 4I_*^{-1}\int_0^s \hat\alpha g^{1/2} \int_0^t \hat\alpha g^{1/2}, \qquad (15)$$

where $I_*$ is given by (11) and $\hat\alpha$ is as in Assumption 2.3. For the precise
definition of a regular estimator of $G(\cdot)$, analogous to that for estimators of
$\theta_0$, see [1].

Theorem 2.3. Suppose that $\hat G(\cdot)_n$ is a regular estimator of
$G(\cdot) = \int_\cdot^\infty g\, d\nu^+$ in the model $f = f(\cdot; \theta, g, h)$ with
limit process $\mathbb{S}$, that Assumptions 2.2-2.4 hold, and that Assumption 2.1 holds
with $\mathcal{A}$ given by $\{\alpha \in L^2(\nu^+) : \int \alpha g^{1/2}\, d\nu^+ = 0\}$.
Then

$$\mathbb{S} =_d Z_* + W,$$

where $Z_*$ is a mean zero Gaussian process with covariance function $K_*(s,t)$ given by
(15) and the process $W$ is independent of $Z_*$.

For the local asymptotic minimax bound, we let $\ell : C[0, 1] \to \mathbb{R}^+$ be a
subconvex loss function, such as $\ell(x) = \sup_t |x(t)|$,
$\ell(x) = \int |x(t)|^2\, dt$, or $\ell(x) = 1(x : \|x\| \ge c)$.

Theorem 2.4. Suppose the hypotheses of Theorem 2.3 are satisfied, that $\ell$ is
subconvex, and that $B_n(c)$ is as in (13). Then

$$\lim_{c \to \infty} \lim_{n \to \infty} \inf_{\hat G(\cdot)_n} \sup_{f_n \in B_n(c)} E_{f_n}\, \ell(\sqrt{n}(\hat G(\cdot)_n - G_n)) \ge E\ell(Z_*), \qquad (16)$$

where $Z_*$ is the mean zero Gaussian process with covariance $K_*(s,t)$ given by (15).

The infimum over estimators $\hat G(\cdot)_n$ in (16) is taken over the class of
"generalized procedures" as in [13], page 235. The proofs of Theorems 2.1-2.4 in the
Appendix detail the modifications required for the application of the methods of [1] to
the case at hand.

2.2. Main results

We now specify our model $f$ for the nested case control sampling of $m - 1$ controls for
the failure in each group. For any integer $k$, let $[k] = \{1, \ldots, k\}$, and for any
set $S$ let $\mathcal{P}_k(S)$ be the collection of all subsets of $S$ of size $k$. Groups
of individuals of size $\eta \ge m$ are observed up to the time of the first failure, at
which point covariates are collected on a simple random sample of $m - 1$ non-failed
individuals and the failure.

An observation $X = (\eta, i, r, t, z_r)$ consists of the group size $\eta$, the identity
$i \in [\eta]$ of the failed individual, the group $r \subset [\eta]$ of the $m$
individuals whose covariates are collected, the time $t$ of the failure, and the covariates
$z_r = \{z_j, j \in r\}$. In particular, $X$ takes values in the space

$$\mathcal{X} = \bigcup_{\eta \ge m} \{\eta\} \times [\eta] \times \mathcal{P}_m([\eta]) \times \mathbb{R}^+ \times \mathbb{R}^m,$$

which we endow with the $\sigma$-finite product measure

$$\sigma = (\text{counting measure}) \times (\text{counting measure}) \times (\text{counting measure}) \times \nu^+ \times \nu^m.$$

To begin the specification of the density $f$ of the observations, corresponding to the
baseline density $g$ on $\mathbb{R}^+$ are the baseline survival and hazard functions, for
$t \ge 0$, given by, respectively,

$$G(t) = \int_t^\infty g(u)\, du \quad \text{and} \quad \lambda(t) = \begin{cases} g(t)/G(t), & \text{for } G(t) > 0, \\ 0, & \text{otherwise.} \end{cases}$$
Efficiency of the MPLE for sampling 






Under the assumed standard exponential relative risk form, the hazard function
$\lambda(t; z)$ for an individual with covariate value $z$ is the baseline hazard scaled by
the factor $\exp(\theta z)$, that is, $\lambda(t; z) = \exp(\theta z)\lambda(t)$, resulting
in survival and density functions, respectively, of

$$G_\theta(t; z) = G^{\exp(\theta z)}(t) \quad \text{and} \quad g_\theta(t; z) = \begin{cases} e^{\theta z} g(t)\, G^{\exp(\theta z) - 1}(t), & \text{for } G(t) > 0, \\ 0, & \text{otherwise;} \end{cases}$$

we note $g_0(t; z) = g_\theta(t; 0) = g(t)$. As the marginal covariate density is $h$, the
survival function $G_\theta(t; z)$ averaged over individuals with covariate density $h(z)$
results in the (mixture) survival function

$$G_\theta(t) = \int G_\theta(t; z) h(z)\, dz$$

for individuals whose covariates are not observed.
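The relations between the baseline and the covariate-specific survival and density functions can be illustrated numerically. The sketch below assumes a unit exponential baseline, $G(t) = e^{-t}$, and replaces the covariate density $h$ by a discrete distribution; both are hypothetical choices made only so the example is self-contained.

```python
import math

def baseline_survival(t):
    """Assumed baseline survival G(t) = exp(-t) (unit exponential)."""
    return math.exp(-t)

def baseline_density(t):
    """Density g(t) = exp(-t) matching the assumed baseline."""
    return math.exp(-t)

def survival(t, z, theta):
    """G_theta(t; z) = G(t) ** exp(theta * z)."""
    return baseline_survival(t) ** math.exp(theta * z)

def density(t, z, theta):
    """g_theta(t; z) = exp(theta z) g(t) G(t) ** (exp(theta z) - 1)."""
    risk = math.exp(theta * z)
    return risk * baseline_density(t) * baseline_survival(t) ** (risk - 1.0)

def mixture_survival(t, theta, values, probs):
    """Average of G_theta(t; z) over a discrete covariate law standing in for h."""
    return sum(p * survival(t, z, theta) for z, p in zip(values, probs))

# At theta = 0 the covariate has no effect and the baseline is recovered.
print(density(0.7, 1.3, 0.0), baseline_density(0.7))
print(mixture_survival(0.7, 0.0, [0.0, 1.0], [0.5, 0.5]), baseline_survival(0.7))
```

A finite difference of `survival` in `t` reproduces `-density`, which is a quick way to confirm that the two displayed formulas are consistent with each other.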

The group size $\eta$ may vary from stratum to stratum, and we assume it to be random with
distribution, say, $\varrho$. At the time $t$ of the failure of individual $i$, a simple
random sample of size $m - 1$ is taken from the non-failures to serve as controls. Hence,
when the group size is $\eta$ and the identity of the failure $i$, the probability that the
set $r \subset [\eta]$ is selected is given by

$$K_{\eta,m} = \binom{\eta - 1}{m - 1}^{-1}$$

for any set $r$ of size $m$ containing $i$. We assume that the individuals in $[\eta]$ are
independent, and therefore the density of the sampled covariates $z_r$ is the product

$$h(z_r) = \prod_{j \in r} h(z_j).$$

Putting all the factors together, the density for $X = (\eta, i, r, t, z_r)$ is given by

$$\begin{aligned}
f(X; \theta, g, h) &= K_{\eta,m}\, g_\theta(t; z_i) \Big[\prod_{j \in r \setminus \{i\}} G_\theta(t; z_j)\Big] G_\theta(t)^{\eta - m} h(z_r) \varrho(\eta) \\
&= K_{\eta,m}\, e^{\theta z_i} g(t)\, G(t)^{\sum_{j \in r} e^{\theta z_j} - 1}\, G_\theta(t)^{\eta - m} h(z_r) \varrho(\eta). \qquad (17)
\end{aligned}$$

For the sake of clarity or brevity, the density may be written with either its parameters
or its variables suppressed, that is, as $f(\eta, i, r, t, z_r)$ or $f(\theta, g, h)$,
respectively. At the null, (17) reduces to

$$f(X; \theta_0, g, h) = K_{\eta,m}\, g(t) G(t)^{\eta - 1} h(z_r) \varrho(\eta), \qquad (18)$$

which, in agreement with the notation introduced in Section 2.1, may appear in the
abbreviated form $f_0$. We may take the distribution of $\eta$ as known when proving
Theorem 2.5 since the MPLE is computed without knowledge of $\varrho$ and already achieves
the bound (20) in the limit.

We are now ready to state our main result regarding the estimation of the parametric
component of the model.

Theorem 2.5. Suppose that rj > 2 almost surely, E[ri^] < ao and at least one of the 
following conditions is satisfied: 

(i) Positivity: The parameter space $\Theta = [0, \infty)$, the covariates $Z$ take on
non-negative values, and $\eta \ge m$ almost surely.

(ii) Boundedness: The covariates $Z$ are bounded and $\eta \ge m$ almost surely.

(iii) Cohort size: $1 \le m \le \eta - 4$ almost surely.

Then Theorems 2.1 and 2.2 obtain for the nested case control model given in (17) with
effective information

$$I_*(\theta_0) = \mathrm{Var}(Z)\left(1 - \frac{1}{m}\right) + m\,\mathrm{Var}(Z)\left(E\left[\frac{1}{\eta^2}\right] + \mathrm{Var}\left(\frac{1}{\eta}\right)\right). \qquad (19)$$

In particular, under any of the above three scenarios, if $\varrho_n$ is a sequence of
distributions such that $\eta_n \to_p \infty$ when $\eta_n$ has distribution $\varrho_n$,
then

$$I_\infty(\theta_0) = \lim_{n \to \infty} I_*(\theta_0) = \mathrm{Var}(Z)\left(\frac{m - 1}{m}\right) \qquad (20)$$

and hence the Cox MPLE is efficient for the limiting nested case control model.



The situation where there is full cohort information is covered by the special case
$P(\eta = m) = 1$, for which (19) reduces to the lower bound $\mathrm{Var}(Z)$, recovering
the result of [1] for the case of no censoring. See Section A.3 for some remarks on the
rationale behind the three conditions in Theorem 2.5.

Next, we consider lower bounds for the estimation of the nonparametric component 
of the model. It is shown in [8] that the Breslow estimator of the baseline survival is 
asymptotically normal with covariance function 

a;(.,i) = G(i)G(s)^'^* j^^J^^^^^,^ + [i?(^)]'(logG(i)logG(,s))[/.(0„)]-i) , (21) 

where I*{9o) is given in (20). 

Theorem 2.6. Let the hypotheses of Theorem 2.5 be satisfied. Then on any interval 
[0,To] for which G(To) > 0, the conclusions of Theorems 2.3 and 2.4 hold with 

K4s, t) ^ G{t)G{s) E[r,-G{!i)^+^] + logG(.))[4^(0o)]-^) • (22) 

By (20) and (21), we see that the Breslow estimator becomes asymptotically efficient 
as the cohort size increases under the nested case control model considered. 
Theorem 2.5 follows from Theorems 2.1 and 2.2. The application of these theorems is
a consequence of Theorem 4.1, which provides the effective information $I_*(\theta_0)$, and
the verification of Assumptions 2.1-2.3. In [9], a simple argument shows that
Assumption 2.1 is satisfied with $\mathcal{A}$ and $\mathcal{B}$ given by (40). The
verification of Assumption 2.2 is somewhat involved. The relevant quantities, $A$, $B$,
$\hat\alpha$, $\hat\beta$ and $\rho_\theta$, are given in (24), (25), Lemmas 3.2, 3.4 and
(23), respectively. The remainder of the verification of Assumption 2.2, that is, the
convergence to zero in (8), is shown in Lemma 3.1 whose proof is deferred to [9].
Assumption 2.3 follows in a fairly straightforward manner from (40). Some remarks on
the calculations in [9] can be found in Section A.3.

Theorem 2.6 follows similarly from Theorems 2.3 and 2.4. In addition to Assumptions
2.1-2.3, the application of these theorems follows from Theorem 4.2, which verifies the
covariance lower bound (22); Lemma A.2, from which Assumption 2.4 on $[0, T_0]$ follows
easily; and (40), which shows that $\mathcal{A}$ is of the form required by Theorem 2.3.
Regarding the restriction of the result to $[0, T_0]$, see Example 4 in [1], page 450 in
particular, and the proof of Lemma 2 in [16].

3. Operators A and B: properties 

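The following lemma provides the parametric score $\rho_\theta$ and the operators $A$ and
$B$ required by Assumption 2.2 and needed for the computation of the effective information
$I_*$ in (11). Sums over $r$ denote a sum over all $r \subset [\eta]$ of size $m$, and sums
over $\eta, i, r$ are short for the sum over all $\eta \in \mathbb{Z}^+$, $i \in [\eta]$ and
$r \subset [\eta]$ of size $m$ with $r \ni i$.

Lemma 3.1. Assumption 2.2 is satisfied for the nested case control model (17) with

$$\rho_\theta = \frac{1}{2}\Big[z_i + \log G(t) \sum_{j \in r}(z_j - EZ) + \eta\, EZ \log G(t)\Big] f_0^{1/2}, \qquad (23)$$

$$A\alpha = \Big[g^{-1/2}(t)\alpha(t) + (\eta - 1)\frac{\int_t^\infty g^{1/2}\alpha\, d\nu^+}{G(t)}\Big] f_0^{1/2} \qquad (24)$$

and

$$B\beta = \Big[\sum_{j \in r} h^{-1/2}(z_j)\beta(z_j)\Big] f_0^{1/2}. \qquad (25)$$

Lemma 3.1 is proved in [9].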



3.1. A operator: properties 



Regarding the definition and calculation of adjoint operators such as A* in the follow- 
ing lemma, the reader is referred to [12]. The proof of the following lemma appears in 
Section A.l. 
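Lemma 3.2. Let $\rho_\theta$ and $A$ be given by (23) and (24), respectively. Then the
function

$$\hat\alpha = \frac{EZ}{2}[1 + \log G(t)]\, g^{1/2}(t)$$

is the solution to the normal equation $A^*A\alpha = A^*\rho_\theta$ and the projection of
$\rho_\theta$ onto the range of $A$ is given by

$$A\hat\alpha = \frac{EZ}{2}[1 + \eta \log G(t)]\, f_0^{1/2}. \qquad (26)$$

3.2. B operator: properties

Let $r \subset [\eta]$ of size $m$ be fixed. For $s \subset r$ let
$z_s = \{z_j : j \in s\}$ and $z_{\neg s} = \{z_j : j \in r \setminus s\}$ and denote
integration over $z_s$ and $z_{\neg s}$ with respect to the measures $\nu^{|s|}$ and
$\nu^{m - |s|}$ by $dz_s$ and $dz_{\neg s}$, respectively. When $s = \{j\}$, we identify the
$j$th variable $z_j$ with $z$. Integration with respect to $\nu^+$ is often indicated by
$dt$, but may also be indicated by other notations such as $du$, or suppressed, when clear
from context.

Lemma 3.3. The adjoint $B^* : L^2(\sigma) \to L^2(\nu)$ of the operator $B$ in (25) is
given by

$$B^*\mu = h^{-1/2}(z) \sum_{\eta, i, r} \sum_{j \in r} \int\!\!\int f_0^{1/2}\mu\, dt\, dz_{\neg j}. \qquad (27)$$

Proof. As $B = \sum_{j \in r} B_j$ with

$$B_j\beta = h^{-1/2}(z_j) f_0^{1/2} \beta(z_j) \quad \text{for } \beta \in L^2(\nu),$$

by linearity one need only sum the adjoints $B_j^*$ of $B_j$ over $j \in r$ to obtain $B^*$.
For $\mu \in L^2(\sigma)$, the calculation

$$\begin{aligned}
\langle B_j\beta, \mu\rangle_\sigma &= \int B_j\beta\, \mu\, d\sigma \\
&= \sum_{\eta, i, r} \int\!\!\int h^{-1/2}(z_j) f_0^{1/2} \beta(z_j)\, \mu\, dt\, dz_r \\
&= \int \beta(z) \Big(h^{-1/2}(z) \sum_{\eta, i, r} \int\!\!\int f_0^{1/2}\mu\, dt\, dz_{\neg j}\Big) dz
\end{aligned}$$

provides the desired conclusion. □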
The proof of the following lemma appears in Section A.2.

Lemma 3.4. The function

$$\hat\beta = \frac{1}{2} E\left[\frac{\eta - m}{m\eta}\right](z - EZ)\, h^{1/2}(z) \qquad (28)$$

is the solution to the normal equation $B^*B\beta = B^*\rho_\theta$, and the projection of
$\rho_\theta$ onto the range of $B$ is given by

$$B\hat\beta = \frac{1}{2} E\left[\frac{\eta - m}{m\eta}\right]\Big[\sum_{j \in r}(z_j - EZ)\Big] f_0^{1/2}. \qquad (29)$$

4. Lower bound calculations 

We begin the computation of the information bound by showing that the two operators 
A and B have orthogonal ranges. 

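Lemma 4.1. Let $A$ and $B$ be the operators given by (24) and (25), respectively. Then

$$B^*A = 0 \quad\text{and}\quad A^*B = 0.$$

Proof. Since $(A^*B)^* = B^*A$ it suffices to prove only the first claim. By (24) and (27),

$$\begin{aligned}
B^*A\alpha &= B^*\Big\{\Big[g^{-1/2}(t)\alpha(t) + (\eta-1)\frac{\int_t^\infty g^{1/2}\alpha}{G(t)}\Big] f_0^{1/2}\Big\} \\
&= h^{-1/2}(z)\sum_{\eta,i,r}\sum_{j\in r}\int\!\!\int\Big[g^{-1/2}(t)\alpha(t) + (\eta-1)\frac{\int_t^\infty g^{1/2}\alpha}{G(t)}\Big] f_0\,dt\,dz_{\neg j} \\
&= h^{-1/2}(z)\sum_{\eta,i,r}\sum_{j\in r} K_{\eta,m}\varrho(\eta)\int h(z_r)\int_0^\infty\Big[g^{-1/2}(t)\alpha(t) + (\eta-1)\frac{\int_t^\infty g^{1/2}\alpha}{G(t)}\Big] g(t)G^{\eta-1}(t)\,dt\,dz_{\neg j}.
\end{aligned}$$

Integrating the inner integral by parts,

$$\begin{aligned}
\int_0^\infty\Big[g^{-1/2}(t)\alpha(t) &+ (\eta-1)\frac{\int_t^\infty g^{1/2}\alpha}{G(t)}\Big] g(t)G^{\eta-1}(t)\,dt \\
&= \int_0^\infty\Big(g^{1/2}(t)G^{\eta-1}(t)\alpha(t) + (\eta-1)g(t)G^{\eta-2}(t)\int_t^\infty g^{1/2}\alpha\Big)dt \\
&= \int_0^\infty g^{1/2}(t)G^{\eta-1}(t)\alpha(t)\,dt - G^{\eta-1}(t)\int_t^\infty g^{1/2}\alpha\,\Big|_0^\infty - \int_0^\infty g^{1/2}(t)G^{\eta-1}(t)\alpha(t)\,dt \\
&= \int_0^\infty g^{1/2}(t)\alpha(t)\,dt,
\end{aligned}$$

which equals zero by (3). □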



The perpendicularity relation that holds between A and B allows for the application 
of the following lemma, which simplifies the calculation of the information bound. 

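Lemma 4.2. Let $\mathbb{K}$ be given by (9). Then under the perpendicularity relations provided by Lemma 4.1, the function

$$\delta = A\alpha + B\beta \quad\text{minimizes}\quad \|\rho_0 - \delta\|_\sigma \quad\text{over } \delta \in \mathbb{K},$$

where $\alpha$ and $\beta$ are the solutions to the normal equations $A^*A\alpha = A^*\rho_0$ and $B^*B\beta = B^*\rho_0$, respectively. Consequently, the effective information (11) is given by

$$I_*(\theta_0) = 4\,\|\rho_0 - A\alpha - B\beta\|_\sigma^2. \tag{30}$$

Proof. Since $A^*B = 0$ we have $A^*\rho_0 = A^*A\alpha = A^*\delta$ and similarly $B^*\rho_0 = B^*B\beta = B^*\delta$. Therefore $(A + B)^*\rho_0 = (A + B)^*\delta$, or $(A + B)^*(\rho_0 - \delta) = 0$. Hence we have

$$\rho_0 - \delta \perp \mathbb{K} \quad\text{and}\quad \delta \in \mathbb{K},$$

showing $\delta$ is the claimed minimizer. □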
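
The mechanism behind Lemma 4.2 can be illustrated in finite dimensions. The following is only a sketch under stated assumptions: vectors in $\mathbb{R}^4$ stand in for elements of $L^2(\sigma)$, and the single columns `A_col` and `B_col`, as well as the vector `rho0`, are hypothetical numbers chosen so that the two ranges are orthogonal. Solving the two normal equations separately and adding the projections then gives a residual orthogonal to both ranges.

```python
# Finite-dimensional sketch of Lemma 4.2 (all numbers are illustrative assumptions).

def dot(u, v):
    """Euclidean inner product, standing in for the L^2(sigma) inner product."""
    return sum(x * y for x, y in zip(u, v))

def project_onto(col, rho):
    """Solve the one-dimensional normal equation <col, col> c = <col, rho>
    and return the projection c * col of rho onto span(col)."""
    c = dot(col, rho) / dot(col, col)
    return [c * x for x in col]

def sq_dist(rho, a, b, A_col, B_col):
    """Squared distance from rho to the element a * A_col + b * B_col."""
    diff = [r - a * x - b * y for r, x, y in zip(rho, A_col, B_col)]
    return dot(diff, diff)

A_col = [1.0, 1.0, 0.0, 0.0]    # spans the range of the stand-in for A
B_col = [1.0, -1.0, 2.0, 0.0]   # spans the range of the stand-in for B
rho0 = [3.0, 1.0, 4.0, 1.0]

# Projections obtained from the two separate normal equations.
proj_A = project_onto(A_col, rho0)
proj_B = project_onto(B_col, rho0)
delta = [a + b for a, b in zip(proj_A, proj_B)]
residual = [r - d for r, d in zip(rho0, delta)]
```

Because `dot(A_col, B_col)` is zero, the residual is orthogonal to both columns, which is the finite-dimensional analogue of $\rho_0 - \delta \perp \mathbb{K}$.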

We pause to record a simple calculation that will be used frequently in what follows. 

Lemma 4.3. Let $s(t)$ be any density on $\mathbb{R}^+$ and $S(t)$ the corresponding survival function. Then for all integers $\eta$ and $k$ satisfying $\eta \ge k$, and $j = 1, 2, \ldots$,

$$\int_0^\infty s(t)\,S(t)^{\eta-k}\,[\log S(t)]^j\,dt = (-1)^j(\eta - k + 1)^{-(j+1)}\,j!.$$

In particular, as $\log S(t) \le 0$ for all $t \in \mathbb{R}^+$, if $k$ and $j$ are fixed, then for any constant $C > 1$ there exists $\eta_C$ such that

$$\int_0^\infty s(t)\,S(t)^{\eta-k}\,|\log S(t)|^j\,dt \le \frac{C\,j!}{\eta^{j+1}} \quad\text{for all } \eta \ge \eta_C.$$

Proof. Rewriting the integral and then applying the change of variables $u = S(t)^{\eta-k+1}$, followed by $u = e^{-x}$, we have

$$\begin{aligned}
\int_0^\infty s(t)S(t)^{\eta-k}[\log S(t)]^j\,dt &= (\eta-k+1)^{-j}\int_0^\infty s(t)S(t)^{\eta-k}\big[\log S(t)^{\eta-k+1}\big]^j\,dt \\
&= (\eta-k+1)^{-(j+1)}\int_0^1 (\log u)^j\,du \\
&= (-1)^j(\eta-k+1)^{-(j+1)}\,\Gamma(j+1) \\
&= (-1)^j(\eta-k+1)^{-(j+1)}\,j!.
\end{aligned}$$

Taking absolute value and noting that $(\eta - k + 1)/\eta \to 1$ suffices to prove the final claim. □
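
The identity of Lemma 4.3 is easy to check numerically. The sketch below assumes, purely for illustration, the survival function $S(t) = e^{-t^2}$ with density $s(t) = 2te^{-t^2}$ (the lemma itself holds for any density), and compares a Simpson-rule approximation of the integral with the closed form $(-1)^j j!/(\eta-k+1)^{j+1}$. The function names, the truncation point `upper` and the grid size `steps` are choices made here, not part of the original argument.

```python
import math

def lemma_4_3_lhs(eta, k, j, upper=8.0, steps=20000):
    """Simpson approximation of the integral over (0, upper) of
    s(t) S(t)^(eta-k) [log S(t)]^j for the illustrative choice
    S(t) = exp(-t^2), s(t) = 2 t exp(-t^2). The tail beyond `upper`
    is negligible because S(8) = exp(-64)."""
    def integrand(t):
        log_S = -t * t
        # s(t) S(t)^(eta-k) = 2 t exp(-(eta-k+1) t^2)
        return 2.0 * t * math.exp(log_S * (eta - k + 1)) * log_S ** j

    h = upper / steps  # steps must be even for Simpson's rule
    total = integrand(0.0) + integrand(upper)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3.0

def lemma_4_3_rhs(eta, k, j):
    """Closed form (-1)^j j! / (eta - k + 1)^(j+1) from Lemma 4.3."""
    return (-1) ** j * math.factorial(j) / (eta - k + 1) ** (j + 1)
```

The case $\eta = k$, $j = 1$ gives $\int_0^\infty \log S(t)\,s(t)\,dt = -1$, which is the special case invoked in Section A.3.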

Theorem 4.1. The effective information for the nested case control model (17) is given 
by (19). 

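Proof. Write $c = E[(\eta - m)/(\eta m)]$ for brevity. Substituting (23), (26) and (29) into (30) we obtain

$$\begin{aligned}
\rho_0 - A\alpha - B\beta &= \frac{1}{2}\Big[(z_i - EZ) + \log G(t)\sum_{j\in r}(z_j - EZ) - c\sum_{j\in r}(z_j - EZ)\Big] f_0^{1/2} \\
&= \frac{1}{2}\Big[(z_i - EZ) + \sum_{j\in r}(z_j - EZ)\big(\log G(t) - c\big)\Big] f_0^{1/2} \\
&= \frac{1}{2}\Big[\big(1 + \log G(t) - c\big)(z_i - EZ) + \sum_{j\in r\setminus\{i\}}(z_j - EZ)\big(\log G(t) - c\big)\Big] f_0^{1/2}.
\end{aligned}$$

Squaring and integrating against the null density (18), the cross terms vanish by the independence of $Z_i$ and $\{Z_j, j \in r \setminus \{i\}\}$, and we obtain

$$\begin{aligned}
\|\rho_0 - A\alpha - B\beta\|_\sigma^2 &= \frac{1}{4}\sum_{\eta,i,r}\int\!\!\int\Big[\big(1 + \log G(t) - c\big)^2(z_i - EZ)^2 + \sum_{j\in r\setminus\{i\}}\big(\log G(t) - c\big)^2(z_j - EZ)^2\Big] f_0\,dt\,dz_r \\
&= \frac{\mathrm{Var}(Z)}{4}\sum_\eta \varrho(\eta)\int_0^\infty\Big[\big(1 + \log G(t) - c\big)^2 + (m-1)\big(\log G(t) - c\big)^2\Big]\eta\,g(t)G^{\eta-1}(t)\,dt \\
&= \frac{\mathrm{Var}(Z)}{4}\,E\int_0^\infty\Big[1 + 2\big(\log G(t) - c\big) + m\big(\log G(t) - c\big)^2\Big]\eta\,g(t)G^{\eta-1}(t)\,dt \\
&= \frac{\mathrm{Var}(Z)}{4}\,E\Big[1 + 2\Big(-\frac{1}{\eta} - c\Big) + m\Big(\frac{2}{\eta^2} + \frac{2c}{\eta} + c^2\Big)\Big]
\end{aligned}$$

by applying Lemma 4.3. Simplifying we obtain

$$I_*(\theta_0) = \mathrm{Var}(Z)\,\frac{m-1}{m} + m\,\mathrm{Var}(Z)\Big(2\,\mathrm{Var}\Big(\frac{1}{\eta}\Big) + \Big[E\Big(\frac{1}{\eta}\Big)\Big]^2\Big),$$

which is (19). □

We now calculate the lower bound for the estimation of the baseline survival.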
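
The final simplification in the proof of Theorem 4.1 is pure algebra in the moments of $1/\eta$, so it can be checked exactly with rational arithmetic. The sketch below assumes the bracketed expression has the form $E[1 + 2(-1/\eta - c) + m(2/\eta^2 + 2c/\eta + c^2)]$ with $c = E[(\eta-m)/(\eta m)]$, and that the simplified form is $(m-1)/m + m(2\,\mathrm{Var}(1/\eta) + [E(1/\eta)]^2)$. The cohort-size distribution `pmf` is a hypothetical example, and the function names are introduced here for illustration only.

```python
from fractions import Fraction

def expect(pmf, fn):
    """Expectation of fn(eta) under a finite distribution {eta: probability}."""
    return sum(p * fn(eta) for eta, p in pmf.items())

def bracket_before_simplifying(pmf, m):
    """E[1 + 2(-1/eta - c) + m(2/eta^2 + 2c/eta + c^2)], c = E[(eta - m)/(eta m)]."""
    c = expect(pmf, lambda eta: Fraction(eta - m, eta * m))
    return expect(
        pmf,
        lambda eta: 1 + 2 * (Fraction(-1, eta) - c)
        + m * (Fraction(2, eta ** 2) + 2 * c / eta + c ** 2),
    )

def bracket_after_simplifying(pmf, m):
    """(m - 1)/m + m (2 Var(1/eta) + E[1/eta]^2)."""
    mean_inv = expect(pmf, lambda eta: Fraction(1, eta))
    var_inv = expect(pmf, lambda eta: Fraction(1, eta ** 2)) - mean_inv ** 2
    return Fraction(m - 1, m) + m * (2 * var_inv + mean_inv ** 2)

# Hypothetical distribution of the cohort size eta.
pmf = {5: Fraction(1, 4), 8: Fraction(1, 2), 20: Fraction(1, 4)}
```

Multiplying either expression by $\mathrm{Var}(Z)$ gives the effective information; as $\eta$ grows, the second term vanishes and the ratio to $\mathrm{Var}(Z)$ tends to $(m-1)/m$.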



Theorem 4.2. The covariance function K^{s, t) in (15) specializes to (22) for the nested 
case control model (17). 

Proof. Lemma A.2 shows that $A^*A$ is given by (36) with $M_0(t) = E[\eta G^\eta(t)]$. Now (6.8) of [1] yields

$$K(s,t) = -G(t)G(s)\int_0^{s\wedge t}\frac{dG(u)}{M_0(u)G(u)} = -G(t)G(s)\int_0^{s\wedge t}\frac{dG(u)}{E[\eta G(u)^{\eta+1}]}.$$

Regarding the integral in (15), we use the form of $\alpha$ given in Lemma 3.2. Substitution into (15) now yields (22). □

Appendix

In the following four sections of this Appendix we prove Lemmas 3.2 and 3.4 and provide some remarks regarding the verification of Assumptions 2.1-2.3, and the proofs of Theorems 2.1-2.4.

A.1. A operator

In this section we provide the proof of Lemma 3.2. We begin by calculating the adjoint $A^* : L^2(\sigma) \to L^2(\nu^+)$ of the operator $A$ given in Lemma 3.1.

Lemma A.1. For the operator $A : L^2(\nu^+) \to L^2(\sigma)$ given in (24), write $A = A_1 + A_2$, where

$$A_1\alpha = g^{-1/2} f_0^{1/2}\alpha \quad\text{and}\quad A_2\alpha = (\eta - 1)\,\frac{\int_t^\infty g^{1/2}\alpha}{G(t)}\, f_0^{1/2}. \tag{31}$$

Then the adjoint of $A$ is given by $A^* = A_1^* + A_2^*$, where

$$A_1^*\mu = g^{-1/2}(t)\sum_{\eta,i,r}\int f_0^{1/2}\mu\,dz_r \quad\text{and}\quad A_2^*\mu = g^{1/2}(t)\sum_{\eta,i,r}(\eta-1)\int\!\!\int_0^t \frac{f_0^{1/2}}{G(u)}\,\mu\,dz_r\,du. \tag{32}$$




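Proof. Let $\alpha \in L^2(\nu^+)$ and $\mu \in L^2(\sigma)$. Then

$$\langle A_1\alpha, \mu\rangle_\sigma = \langle g^{-1/2} f_0^{1/2}\alpha, \mu\rangle_\sigma = \int \alpha(t)\Big[g^{-1/2}(t)\sum_{\eta,i,r}\int f_0^{1/2}\mu\,dz_r\Big]dt = \langle \alpha, A_1^*\mu\rangle_{\nu^+}$$

when $A_1^*$ is as given in (32). Next, writing $A_2$ as

$$A_2\alpha = L\int_t^\infty g^{1/2}\alpha \quad\text{for}\quad L = (\eta - 1)\,G^{-1}(t)\, f_0^{1/2}, \tag{33}$$

we have

$$\langle A_2\alpha, \mu\rangle_\sigma = \sum_{\eta,i,r}\int\!\!\int L\,\mu\int_t^\infty g^{1/2}(u)\alpha(u)\,du\,dz_r\,dt = \int \alpha(t)\Big[g^{1/2}(t)\sum_{\eta,i,r}\int\!\!\int_0^t L\,\mu\,dz_r\,du\Big]dt = \langle \alpha, A_2^*\mu\rangle_{\nu^+}$$

when

$$A_2^*\mu = g^{1/2}(t)\sum_{\eta,i,r}\int\!\!\int_0^t L\,\mu\,dz_r\,du.$$

Substituting $L$ from (33) now yields the stated conclusion. □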



To help express the solution to the normal equations in $A$, for $\alpha \in L^2(\nu^+)$ define the operator $R$ as in [1] by the first equality in

$$R\alpha = g^{-1/2}(t)\alpha(t) - \frac{\int_t^\infty g^{1/2}\alpha}{G(t)} = g^{-1/2}(t)\alpha(t) + \frac{\int_0^t g^{1/2}\alpha}{G(t)}; \tag{34}$$

the second equality follows from (3). Also, set

$$M_0(t) = E[\eta G^\eta(t)] \quad\text{and}\quad M_1(t) = E[Z]\,E[\eta G(t)^\eta]. \tag{35}$$

Lemma A.2. Let the operator $A$ be given by (24). Then, for $\alpha \in L^2(\nu^+)$,

$$A^*A\alpha = \Big[\frac{M_0(t)}{G(t)}\,R\alpha(t) + \int_0^t R\alpha\,\frac{M_0(u)}{G(u)}\,\frac{dG}{G(u)}\Big] g^{1/2} \tag{36}$$

and

$$(A^*A)^{-1}\alpha = \Big[\frac{G(t)}{M_0(t)}\,R\alpha(t) + \int_0^t R\alpha\,\frac{G(u)}{M_0(u)}\,\frac{dG}{G(u)}\Big] g^{1/2}. \tag{37}$$

Proof. Using the decompositions $A = A_1 + A_2$ and $A^* = A_1^* + A_2^*$ given in Lemma A.1, write

$$A\alpha = \mu_1 + \mu_2, \quad\text{where } \mu_i = A_i\alpha,\ i = 1, 2,$$

so that

$$A^*A\alpha = (A_1^* + A_2^*)(A_1 + A_2)\alpha = A_1^*\mu_1 + A_1^*\mu_2 + A_2^*\mu_1 + A_2^*\mu_2.$$

Consider $A_1^*\mu_1$. From (31) and (32),

$$\begin{aligned}
A_1^*\mu_1 &= g^{-1/2}(t)\sum_{\eta,i,r}\int f_0\, g^{-1/2}\alpha\,dz_r \\
&= g^{-1}(t)\alpha(t)\sum_{\eta,i,r}\int f_0\,dz_r \\
&= g^{-1}(t)\alpha(t)\sum_{\eta,i,r}\varrho(\eta)\int K_{\eta,m}\,g(t)G^{\eta-1}(t)h(z_r)\,dz_r \\
&= \alpha(t)\sum_\eta \varrho(\eta)\,G^{\eta-1}(t)\,K_{\eta,m}\sum_{i,r}\int h(z_r)\,dz_r \\
&= \alpha(t)\,E[\eta G^{\eta-1}(t)].
\end{aligned}$$

In a similar fashion,

$$\begin{aligned}
A_1^*\mu_2 &= g^{-1/2}(t)\sum_{\eta,i,r}(\eta-1)\int f_0\,\frac{\int_t^\infty g^{1/2}\alpha}{G(t)}\,dz_r = E\Big[\eta(\eta-1)\,g^{1/2}(t)\,G^{\eta-2}(t)\int_t^\infty g^{1/2}\alpha\Big], \\
A_2^*\mu_1 &= g^{1/2}(t)\sum_{\eta,i,r}(\eta-1)\int\!\!\int_0^t \frac{f_0}{G(u)}\,g^{-1/2}\alpha\,dz_r\,du = E\Big[\eta(\eta-1)\,g^{1/2}(t)\int_0^t g^{1/2}G^{\eta-2}\alpha\,du\Big],
\end{aligned}$$

and last,

$$A_2^*\mu_2 = g^{1/2}(t)\sum_{\eta,i,r}(\eta-1)^2\int\!\!\int_0^t \frac{f_0}{G(u)^2}\int_u^\infty g^{1/2}\alpha\,dz_r\,du = E\Big[\eta(\eta-1)^2\,g^{1/2}(t)\int_0^t g(u)G^{\eta-3}(u)\int_u^\infty g^{1/2}(v)\alpha(v)\,dv\,du\Big].$$
Combining terms, we arrive at

$$A^*A\alpha = g^{1/2}(t)\,E\Big[\eta\,g^{-1/2}G^{\eta-1}\alpha + \eta(\eta-1)G^{\eta-2}\int_t^\infty g^{1/2}\alpha + \eta(\eta-1)\int_0^t g^{1/2}G^{\eta-2}\alpha\,du + \eta(\eta-1)^2\int_0^t g\,G^{\eta-3}\int_u^\infty g^{1/2}\alpha\,dv\,du\Big].$$

Recalling $R\alpha$ from (34), we may write

$$g^{-1/2}(t)\alpha(t) = R\alpha(t) + \Phi(t), \quad\text{where}\quad \Phi(t) = \frac{\int_t^\infty g^{1/2}\alpha}{G(t)}.$$

Since $\Phi$ vanishes at zero by (3) and has derivative $-g\,R\alpha/G$, we also have

$$\Phi(t) = \int_0^t R\alpha\,\frac{dG}{G}.$$

Rewriting the remaining terms using these two relations and $g\,du = -dG$, we find

$$\begin{aligned}
A^*A\alpha = g^{1/2}(t)\,E\Big[&\eta G^{\eta-1}(t)R\alpha(t) + \int_0^t \eta G^{\eta-1}(u)R\alpha(u)\,\frac{dG}{G(u)} \\
&+ \eta^2\Big(G^{\eta-1}(t)\Phi(t) - \int_0^t\big[(\eta-1)\Phi + R\alpha\big]G^{\eta-2}\,dG\Big)\Big].
\end{aligned} \tag{38}$$

But now we see that the term on the second line of (38) vanishes, since integration by parts, with $d\Phi = R\alpha\,dG/G$, gives

$$\int_0^t (\eta-1)G^{\eta-2}\Phi\,dG = G^{\eta-1}(t)\Phi(t) - \int_0^t G^{\eta-2}R\alpha\,dG.$$

Hence, $A^*A\alpha$ is given by the first line of (38), and taking the expectation inside the integral completes the proof of (36).

Finally, as $A^*A$ is of the form (36), the form (37) of the inverse follows as in [1], page 449. □

We are now in position to prove Lemma 3.2, giving the solution $\alpha$ to the normal equations $A^*A\alpha = A^*\rho_0$, and the projection of $\rho_0$ onto the range of $A$.



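Proof of Lemma 3.2. With $\rho_0$ as in (23), we first claim

$$A^*\rho_0 = \frac{EZ}{2}\,g^{1/2}(t)\,E\Big[\frac{\eta}{\eta-1}\big(\eta G^{\eta-1}(t) - 1\big)\Big]. \tag{39}$$

From (32) we obtain directly that

$$A_1^*\rho_0 = \frac{EZ}{2}\,g^{1/2}(t)\,E\big[\eta\big(1 + \eta\log G(t)\big)G^{\eta-1}(t)\big]$$

and

$$\begin{aligned}
A_2^*\rho_0 &= \frac{EZ}{2}\,g^{1/2}(t)\,E\Big[\eta\int_0^t \big(1 + \eta\log G(u)\big)(\eta-1)\,g(u)G^{\eta-2}(u)\,du\Big] \\
&= \frac{EZ}{2}\,g^{1/2}(t)\,E\Big[\eta\Big(\frac{1}{\eta-1}G^{\eta-1}(u) - \eta\log G(u)\,G^{\eta-1}(u)\Big)\Big|_0^t\Big] \\
&= \frac{EZ}{2}\,g^{1/2}(t)\,E\Big[\eta\Big(\frac{1}{\eta-1}\big(G^{\eta-1}(t) - 1\big) - \eta\log G(t)\,G^{\eta-1}(t)\Big)\Big],
\end{aligned}$$

and adding these two contributions yields the result (39).

From (34) and (39),

$$\begin{aligned}
R(A^*\rho_0) &= \frac{EZ}{2}\,E\Big[\frac{\eta}{\eta-1}\Big(\eta G(t)^{\eta-1} - 1 - \frac{1}{G(t)}\int_0^t \big[\eta G^{\eta-1} - 1\big]\,dG\Big)\Big] \\
&= \frac{EZ}{2}\,E\Big[\frac{\eta}{\eta-1}\Big(\eta G(t)^{\eta-1} - 1 + \frac{1}{G(t)}\big(-G^\eta(t) + G(t)\big)\Big)\Big] \\
&= \frac{EZ}{2}\,E\big[\eta G(t)^{\eta-1}\big] = \frac{1}{2}\,\frac{M_1(t)}{G(t)},
\end{aligned}$$

where $M_1(t) = E[Z]\,E[\eta G(t)^\eta]$ in accordance with (35).

Hence, by (37), the solution $\alpha$ to the normal equations $A^*A\alpha = A^*\rho_0$ is given by

$$\begin{aligned}
\alpha = (A^*A)^{-1}A^*\rho_0(t) &= \frac{1}{2}\Big[\frac{M_1(t)}{M_0(t)} + \int_0^t \frac{M_1(s)}{M_0(s)}\,\frac{dG}{G(s)}\Big] g^{1/2}(t) \\
&= \frac{E(Z)}{2}\Big[1 + \int_0^t \frac{dG}{G(s)}\Big] g^{1/2}(t) \\
&= \frac{E(Z)}{2}\,[1 + \log G(t)]\,g^{1/2}(t),
\end{aligned}$$

where we have used $M_1(t)/M_0(t) = EZ$. To calculate the projection $A\alpha$ of $\rho_0$ onto the range of $A$, note

$$\int_t^\infty \alpha(s)\,g^{1/2}(s)\,ds = -\frac{EZ}{2}\int_t^\infty \big(1 + \log G(s)\big)\,dG(s) = \frac{EZ}{2}\,G(t)\log G(t),$$

and hence

$$A\alpha = \frac{EZ}{2}\Big[1 + \log G(t) + (\eta-1)\frac{G(t)\log G(t)}{G(t)}\Big] f_0^{1/2} = \frac{EZ}{2}\,[1 + \eta\log G(t)]\, f_0^{1/2}. \qquad □$$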



A.2. B operator

In this section we prove Lemma 3.4, providing the solution to the normal equations for the operator $B$. Parallel to Section A.1, we begin by deriving an expression for $B^*B$.

Lemma A.3. Let the operator $B$ be given by (25). Then

$$B^*B\beta = m\,\beta(z).$$

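Proof. Applying formulas (27), (25) and (18),

$$\begin{aligned}
B^*B\beta &= h^{-1/2}(z)\sum_{\eta,i,r}\sum_{j\in r}\int\!\!\int f_0\Big[\sum_{k\in r} h^{-1/2}(z_k)\beta(z_k)\Big]dt\,dz_{\neg j} \\
&= h^{-1/2}(z)\sum_{\eta,i,r}\sum_{j\in r}\varrho(\eta)K_{\eta,m}\int_0^\infty g(t)G(t)^{\eta-1}\,dt\int\prod_{l\in r} h(z_l)\Big(\sum_{k\in r} h^{-1/2}(z_k)\beta(z_k)\Big)dz_{\neg j} \\
&= h^{-1/2}(z)\,E\Big[\int_0^\infty \eta\,g(t)G(t)^{\eta-1}\,dt\Big]\sum_{j\in[m]}\int\prod_{l\in[m]} h(z_l)\Big(\sum_{k\in[m]} h^{-1/2}(z_k)\beta(z_k)\Big)dz_{\neg j} \\
&= m\,h^{-1/2}(z)\int\prod_{l=1}^m h(z_l)\Big(\sum_{k=1}^m h^{-1/2}(z_k)\beta(z_k)\Big)dz_{\neg 1},
\end{aligned}$$

where the third equality is by symmetry, and the last by recalling that $z_1$ and $z$ are identified in the integral over $z_{\neg 1}$. Hence

$$\begin{aligned}
B^*B\beta &= m\,h^{1/2}(z)\int\prod_{l=2}^m h(z_l)\Big(h^{-1/2}(z)\beta(z) + \sum_{k=2}^m h^{-1/2}(z_k)\beta(z_k)\Big)dz_{\neg 1} \\
&= m\,\beta(z)\int\prod_{l=2}^m h(z_l)\,dz_{\neg 1} + m\,h^{1/2}(z)\int\prod_{l=2}^m h(z_l)\Big(\sum_{k=2}^m h^{-1/2}(z_k)\beta(z_k)\Big)dz_{\neg 1}.
\end{aligned}$$

As $h$ is a density, the first term integrates to $m\beta(z)$. For the second term,

$$\begin{aligned}
\int\prod_{l=2}^m h(z_l)\sum_{k=2}^m h^{-1/2}(z_k)\beta(z_k)\,dz_{\neg 1} &= \sum_{k=2}^m\int\Big(\prod_{l\notin\{1,k\}} h(z_l)\Big) h^{1/2}(z_k)\beta(z_k)\,dz_{\neg 1} \\
&= \sum_{k=2}^m\int\prod_{l\notin\{1,k\}} h(z_l)\,dz_{\neg\{1,k\}}\int h^{1/2}(z_k)\beta(z_k)\,dz_k = 0
\end{aligned}$$

by (6), showing $B^*B\beta = m\beta(z)$ and the lemma. □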
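
The conclusion of Lemma A.3 can be checked on a finite covariate space, where sums replace the integrals. The sketch below is an illustration under stated assumptions: the probability mass function `h` and the sample size `m` are hypothetical, and `beta` is built as $(z - EZ)h^{1/2}(z)$ so that it satisfies the orthogonality condition $\sum_z h^{1/2}(z)\beta(z) = 0$ playing the role of (6). The function evaluates the last displayed form of $B^*B\beta$ in the first chain of equalities, with $z_1$ identified with $z$.

```python
from itertools import product
import math

def bstar_b(beta, h, m):
    """Evaluate m * h(z)^(-1/2) * sum over (z_2, ..., z_m) of
    prod_l h(z_l) * sum_k h(z_k)^(-1/2) * beta(z_k), with z_1 = z,
    on a finite covariate space given by the keys of h."""
    points = list(h)
    out = {}
    for z in points:
        total = 0.0
        for rest in product(points, repeat=m - 1):
            zs = (z,) + rest                      # z_1 is identified with z
            weight = math.prod(h[x] for x in zs)  # product of h over all m covariates
            total += weight * sum(beta[x] / math.sqrt(h[x]) for x in zs)
        out[z] = m * total / math.sqrt(h[z])
    return out

# Hypothetical covariate distribution and a beta orthogonal to h^(1/2).
h = {0: 0.2, 1: 0.5, 2: 0.3}
mean_z = sum(z * p for z, p in h.items())
beta = {z: (z - mean_z) * math.sqrt(p) for z, p in h.items()}
result = bstar_b(beta, h, m=3)
```

For every point `z` the value `result[z]` agrees with `3 * beta[z]` up to rounding, matching $B^*B\beta = m\beta(z)$.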

Proof of Lemma 3.4. From (27) and (23), arguing as in the proof of Lemma A.3 and applying Lemma 4.3, we obtain

$$\begin{aligned}
B^*\rho_0 &= \frac{1}{2}\,h^{-1/2}(z)\sum_{\eta,i,r}\sum_{j\in r}\int\!\!\int f_0\Big[z_i + \log G(t)\sum_{k\in r}(z_k - EZ) + \eta\log G(t)\,EZ\Big]dt\,dz_{\neg j} \\
&= \frac{1}{2}\sum_{j\in[m]}\int h^{-1/2}(z_j)\,h(z_r)\,E\Big[z_1 - \frac{1}{\eta}\sum_{k\in[m]}(z_k - EZ) - EZ\Big]dz_{\neg j}.
\end{aligned}$$

For $j = 1$ we obtain $(1/2)h^{1/2}(z)E[(\eta - 1)/\eta](z - EZ)$ from the first term in parentheses, together with the $k = 1$ term of the sum, while each remaining term in the sum integrates to zero. For each of the $m - 1$ terms where $j \ne 1$ the first term in parentheses integrates to zero, but when $k = j$ one term in the sum makes a non-zero contribution of $-(1/2)h^{1/2}(z)(z - EZ)E[1/\eta]$, for a total of

$$B^*\rho_0 = \frac{1}{2}\,h^{1/2}(z)(z - EZ)\,E\Big[\frac{\eta - 1}{\eta} - \frac{m - 1}{\eta}\Big] = \frac{1}{2}\,h^{1/2}(z)(z - EZ)\,E\Big[\frac{\eta - m}{\eta}\Big].$$

From Lemma A.3 we clearly have

$$(B^*B)^{-1}\beta = \frac{1}{m}\,\beta, \quad\text{hence}\quad \beta = (B^*B)^{-1}B^*\rho_0 = \frac{1}{2}\,h^{1/2}(z)(z - EZ)\,E\Big[\frac{\eta - m}{\eta m}\Big],$$

proving (28). Applying $B$ as in (25) to $\beta$ now yields (29). □



A.3. Verification of Assumptions 2.1-2.3

In this section we provide a basic outline of the verifications of Assumptions 2.1-2.3 given 
in detail in the technical report [9]. In particular, it is shown there by a simple argument 
that Assumption 2.1 is satisfied with 

^ = {aeL2(i^+):(a,gi/2)^+ =0} and B= Q {p e L'^ive) : {pX''^)u =Q)- (40) 

\e\<6^. 

Given the quantities $A$, $B$, $\alpha$, $\beta$ and $\rho_0$ in (24), (25), Lemmas 3.2, 3.4 and (23), respectively, it is lengthy to verify the remainder of Assumption 2.2, that is, the required convergence (8). One main point of the detailed verification given in [9] is that, with

G„,e(t;z)=C''(i), GnAt)^ jGnAt;z)hn{z)dz and G„(0 = G„,o(i), 
each of the three conditions given in Theorem 2.5 implies that 

^ _ f i^-m \ E,,[Zc^^Gn{tf'] 

is uniformly bounded.

The positivity condition is the one that appears in [3]. Under positivity, $z \ge 0$ and $\theta \ge 0$, the function $ze^{\theta z}$ is increasing in $z$ and $G_n(t)^{e^{\theta z}}$ is decreasing in $z$. Therefore these functions are negatively correlated and we have

$$E_n\big[Ze^{\theta Z}G_n(t)^{e^{\theta Z}}\big] \le E_n\big[Ze^{\theta Z}\big]\,E_n\big[G_n(t)^{e^{\theta Z}}\big] = E_n\big[Ze^{\theta Z}\big]\,G_{n,\theta}(t),$$

and in particular

$$|C_n| \le \big|E_n[Ze^{\theta Z}]\big|,$$

which is a bounded sequence in $n$. Under the bounded covariate condition with, say, $|Z| \le z_0$ almost surely, we have

,^ , 1 EJlZc^^lGnitT"'] 1 IflU 1 fl. 

Under the cohort size condition one shows that 

Cn^('-^)E„[Ze^-G..itr]( '''' 



and since $|E_n[Ze^{\theta Z}G_n(t)^{e^{\theta Z}}]| \le E_n[|Z|e^{\theta Z}]$ is bounded in $n$, and $K_{\eta,m}/K_{\eta-2,m} \le 1$, again we find the constant $C_n$ to be uniformly bounded.
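
The negative correlation step used under the positivity condition is an instance of the general fact that an increasing and a decreasing function of the same random variable have non-positive covariance. The sketch below checks this on a finite distribution. It assumes, as an interpretation of the garbled exponent rather than a confirmed formula, that the decreasing function is $G^{e^{\theta z}}$ for a fixed survival value $0 < G < 1$; the distribution `pmf` and the function names are hypothetical.

```python
import math

def expectation(pmf, fn):
    """Expectation of fn(Z) under a finite distribution {z: probability}."""
    return sum(p * fn(z) for z, p in pmf.items())

def covariance_gap(pmf, theta, G):
    """Return E[f g] - E[f] E[g] for f(z) = z e^{theta z} (increasing for
    z >= 0, theta >= 0) and g(z) = G^{e^{theta z}} (non-increasing in z
    when 0 < G < 1 and theta >= 0)."""
    f = lambda z: z * math.exp(theta * z)
    g = lambda z: G ** math.exp(theta * z)
    joint = expectation(pmf, lambda z: f(z) * g(z))
    return joint - expectation(pmf, f) * expectation(pmf, g)

# Hypothetical non-negative covariate distribution.
pmf = {0.0: 0.3, 0.5: 0.4, 2.0: 0.3}
gaps = [covariance_gap(pmf, theta, G)
        for theta in (0.0, 0.5, 1.0)
        for G in (0.1, 0.5, 0.9)]
```

Every entry of `gaps` is non-positive up to rounding, and it is zero when `theta` is zero because the second function is then constant.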

Regarding Assumption 2.3, by (40) to show that, say, $\alpha \in \mathcal{A}$, it suffices to verify that $\alpha \in L^2(\nu^+)$ and $\langle \alpha, g^{1/2}\rangle_{\nu^+} = 0$. The first claim follows from Lemma 4.3, and, by applying that same lemma with $\eta = k$ and $j = 1$, the second claim from

$$\int_0^\infty \log G(t)\,g(t)\,dt = -1.$$

The verification that $\beta \in \mathcal{B}$ is similar, but somewhat more involved.



A.4. Proofs of Theorems 2.1-2.4

Proof of Theorem 2.1. The set $H$ given in (10) is a subspace of $L^2(\sigma)$, being the image of the subspace $\mathbb{R} \times \mathcal{A} \times \mathcal{B}$ under the linear transformation $(\tau, \alpha, \beta) \mapsto \tau\rho_\theta + A\alpha + B\beta$. Hence, with $\alpha$ and $\beta$ as in Assumption 2.2, by that assumption,

$$\tau\zeta \in H \text{ for all } \tau \in \mathbb{R}, \quad\text{where}\quad \zeta = \rho_\theta - A\alpha - B\beta, \tag{41}$$

and $4\|\tau\zeta\|_\sigma^2 = \tau^2 I_*$. Now let $\{f_n\}_{n \ge 0} \in \mathcal{F}(f, \tau\zeta)$ and continue as in the proof of Theorem 3.1 in [1]. In particular, with $L_n$ as the log likelihood ratio for $f_n$ vs. $f_0$, $S$ as the limiting distribution of $n^{1/2}(\hat\theta_n - \theta_n)$, guaranteed to exist by the regularity of $\hat\theta_n$, and $Z \sim \mathcal{N}(0, I_*)$, the random vector $(n^{1/2}(\hat\theta_n - \theta), L_n)$ converges weakly under $f$ to $(S, \tau Z - \tfrac{1}{2}\tau^2 I_*)$ and
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the characteristic function of S factors into the product of the characteristic functions of 
5* - Z/h times that of Z/h. □ 

Proof of Theorem 2.2. Note that with $\zeta$ as in (41), as $\zeta \in H$ we have that $\mathcal{F}(f, \zeta) \subset \mathcal{F}(f)$, and therefore, for all $c > 0$,

BUc) = {/„ e Hf,0--n'/'\\f!/' f'/X < c} 

Hence the argument for the proof of Theorem 3.2 of [1] is obtained. □ 



Proofs of Theorems 2.3 and 2.4. The results of [13] and [2] apply, with minimal changes, as in the proofs of Theorems 4.1 and 4.2 in [1]. In particular, for
any element of H given by 

\ctT:H->Bo = {xe C[0, 1] : x{0) = a;(l) = 0} be defined by 
Writing ^ as 

C = T{pg -Aa- BP) + A{Ta + a) + B{t$ + (3), 

the orthogonality provided by Assumption 2.3 and Lemma 4.1 yield A*C^ = A*A(Ta + a) 
and therefore 

a = C*C, - {C,A{P0- Aa- Bi3)/h)a, where C = 

Continuing, one may verify that the adjoint T* of T is given by a formula analogous to 
that in Lemma 5.2 of [1], and that 

\\\T*v\\l=E(^j\^du^\ ^ 

We remark that though the subspace $H$ is not assumed to be closed in $L^2(\sigma)$, and hence the projection theorem cannot be applied, as long as $H$ contains the approach to $f$ along the 'worst case' direction $\zeta$, the proof of [1] carries through. Moreover, this holds true independently of the number of factors in the model, one more here than in [1].
The other difference between the situation here and that of [1], that B consists of the 
perturbations that approximate n^^'^{h]J'^ — h^^^) in L'^{vg) for all \6\ < 6'„ rather than 
in the weaker $L^2(\nu)$ sense, is handled by Assumption 2.2, which gives, in particular, that
the critical $\beta$ lies in $\mathcal{B}$ even when insisting on the stronger form of convergence.